Nesterov’s accelerated gradient (NAG) is a momentum-based optimizer that attempts to mitigate the tendency of vanilla SGD with momentum to overshoot the optimum.

NAG proceeds in two steps. First, it computes a set of “look-ahead parameters” using regular old gradient descent:

$$\tilde{\theta}_{t+1} = \theta_t - \eta \, \nabla_\theta J(\theta_t)$$

The momentum term is based on the change in these look-ahead parameters:

$$\theta_{t+1} = \tilde{\theta}_{t+1} + \beta \left( \tilde{\theta}_{t+1} - \tilde{\theta}_t \right)$$

In other words, the momentum reflects how the parameters would continue to change if they kept descending as they did at time $t$. If they were about to overshoot, the momentum term will tend to cancel out the velocity term rather than compound it.
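
To make the two-step update concrete, here is a minimal NumPy sketch of a single NAG iteration in the look-ahead form above. The function name `nag_step`, the toy objective, and the learning-rate/momentum values are illustrative assumptions, not taken from the text:

```python
import numpy as np

def nag_step(theta, look_prev, grad_fn, lr=0.1, beta=0.9):
    """One NAG update in the look-ahead formulation sketched above.

    theta     -- current parameters (theta_t)
    look_prev -- look-ahead parameters from the previous step (theta-tilde_t)
    grad_fn   -- returns the gradient of the objective at a given point
    """
    # Step 1: a plain gradient-descent step produces the new look-ahead parameters.
    look = theta - lr * grad_fn(theta)
    # Step 2: the momentum term is the change in the look-ahead parameters.
    theta_next = look + beta * (look - look_prev)
    return theta_next, look

# Toy demo on f(theta) = theta^2, whose gradient is 2 * theta.
grad_fn = lambda theta: 2.0 * theta
theta = look = np.array([5.0])
for _ in range(100):
    theta, look = nag_step(theta, look, grad_fn)
print(theta)  # close to the optimum at 0
```

The sketch holds the momentum coefficient $\beta$ constant; Nesterov’s original analysis prescribes a particular schedule for it, but a fixed value is what most deep-learning implementations use in practice.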